Abstract

We give a simple inequality for the sum of independent bounded random variables. This inequality 
improves on the celebrated result of Hoeffding in a special case. It is optimal in the limit where the 
sum tends to a Poisson random variable. 



1 Introduction 

Modern machine learning and stochastic programming are largely based on inequalities relating to the 
sums of random variables. Hoeffding [2] proposed several such bounds, which were in turn improved 
by Talagrand [5], Pinelis [4] and Bentkus [1]. In this paper we prove the following related result.

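Theorem 1. Suppose that $S = \sum_{i=1}^n X_i$ is a sum of independent random variables with $\mathbb{P}(0 \le X_i \le 1) = 1$ for $1 \le i \le n$ and $\mathbb{E}S = \lambda$. Then

$$\mathbb{P}(S \le 1) \le \max\left\{ \left(1 + \lambda - \frac{\lambda}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{n-1},\ \left(1 - \frac{\lambda-1}{n-1}\right)^{n-1} \right\} \tag{1}$$

$$\le \max\{1 + \lambda, e\}\, e^{-\lambda}. \tag{2}$$
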
In this context, Hoeffding's inequality states that for all $\lambda > 1$

$$\mathbb{P}(S \le 1) \le \lambda \left(1 + \frac{1-\lambda}{n-1}\right)^{n-1}. \tag{3}$$

Theorem 1 is not as general as Hoeffding's inequality since it only allows us to bound $\mathbb{P}(S \le 1)$ rather than $\mathbb{P}(S \le t)$ for any positive $t$. However, from Theorem 1 we may derive Corollary 1, which states that

$$\mathbb{P}(S \le 1) \le e^{1 - r\,\mathbb{E}S} \quad\text{where } r = 0.841405\ldots. \tag{4}$$

In contrast, the strongest such result that can be obtained from Hoeffding's bound is

$$\mathbb{P}(S \le 1) \le e^{1 - (1 - e^{-1})\,\mathbb{E}S} \quad\text{where } 1 - e^{-1} = 0.6321\ldots. \tag{5}$$

Thus Theorem 1 improves on the Hoeffding bound.
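
As a rough numerical illustration of this comparison, the following sketch evaluates the finite-$n$ bound (1), its $n$-independent form (2) and Hoeffding's bound for a few values of $\lambda$. The function names are hypothetical, and Hoeffding's bound is assumed to take the form $\lambda(1 + (1-\lambda)/(n-1))^{n-1}$ for $1 < \lambda \le n$.

```python
import math

def theorem1_bound(lam: float, n: int) -> float:
    """Right-hand side of (1): max of the binomial and shifted-binomial terms (n >= 2)."""
    f = (1 + lam - lam / n) * (1 - lam / n) ** (n - 1)
    g = (1 - (lam - 1) / (n - 1)) ** (n - 1)
    return max(f, g)

def theorem1_limit(lam: float) -> float:
    """Right-hand side of (2), which does not depend on n."""
    return max(1 + lam, math.e) * math.exp(-lam)

def hoeffding_bound(lam: float, n: int) -> float:
    """Assumed form of Hoeffding's bound on P(S <= 1), for 1 < lam <= n."""
    return lam * (1 + (1 - lam) / (n - 1)) ** (n - 1)

if __name__ == "__main__":
    n = 50
    for lam in (1.5, 2.0, 3.0, 5.0):
        print(lam, theorem1_bound(lam, n), theorem1_limit(lam), hoeffding_bound(lam, n))
```

For the sampled values, the first two columns of bounds should sit below the Hoeffding column, which is the improvement claimed in the text.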

It is interesting to compare our result with Theorem 1.2 of Bentkus [1] in the form of his inequality 1.1. This states that for a sequence of bounded independent random variables $Y_i$ such that $\mathbb{P}(0 \le Y_i \le 1) = 1$ we have

$$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge x\right) \le e\,\mathbb{P}(B_n \ge x) \tag{6}$$

where $B_n \sim \mathrm{binomial}(p, n)$ with $p := \sum_{i=1}^n \mathbb{E}Y_i / n$. If we set $X_i := 1 - Y_i$ and $S = \sum_{i=1}^n (1 - Y_i)$ in order to match the random variables in our Theorem 1, Bentkus's result gives

$$\mathbb{P}(S \le 1) = \mathbb{P}\left(\sum_{i=1}^n Y_i \ge n-1\right) \le e\left(p^n + n(1-p)p^{n-1}\right).$$

If we set $m := \mathbb{E}S$, so that $p = 1 - \frac{m}{n}$, we have $p^n = \left(1 - \frac{m}{n}\right)^n \le e^{-m}$ and $p + n(1-p) \le 1 + m$ so that

$$\mathbb{P}(S \le 1) \le \frac{e}{p}\,(1 + \mathbb{E}S)\,e^{-\mathbb{E}S}.$$

This bound is at least a factor of $e$ larger than our result for all $\mathbb{E}S > e - 1$.
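Furthermore, Theorem 1 is optimal in the following sense. If

$$S \sim \mathrm{binomial}\left(\frac{\lambda}{n}, n\right)$$

then

$$\mathbb{P}(S \le 1) = \mathbb{P}(S = 0) + \mathbb{P}(S = 1) = \left(1 - \frac{\lambda}{n}\right)^n + \lambda\left(1 - \frac{\lambda}{n}\right)^{n-1}$$

corresponding to the first term in the 'max' of (1), while if

$$S \sim 1 + \mathrm{binomial}\left(\frac{\lambda-1}{n-1}, n-1\right)$$

then we get the second term in the 'max'

$$\mathbb{P}(S \le 1) = \left(1 - \frac{\lambda-1}{n-1}\right)^{n-1}.$$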

Similarly, in the $n$-independent form of our bound (2), if $S \sim \mathrm{Poisson}(\lambda)$ then

$$\mathbb{P}(S \le 1) = \mathbb{P}(S = 0) + \mathbb{P}(S = 1) = e^{-\lambda} + e^{-\lambda}\lambda$$

corresponding to the first term in the 'max' of (2). Similarly, if $S \sim 1 + \mathrm{Poisson}(\lambda - 1)$ then we get the second term in the 'max'

$$\mathbb{P}(S \le 1) = e^{1-\lambda}.$$

While the sum of a finite collection of bounded random variables $\sum_{i=1}^n X_i$ cannot have a Poisson distribution, the law of small numbers implies that the Poisson distribution is the limit as $n \to \infty$ of the sum of a suitable collection of random variables $(X_i)_{i=1,2,\ldots,n}$. For instance, if each $X_i$ is a Bernoulli random variable taking value 1 with probability $\lambda/n$ and value 0 otherwise, then the following limiting probability mass function is Poisson:

$$\lim_{n\to\infty} \mathbb{P}\left(\sum_{i=1}^n X_i = x\right) = e^{-\lambda}\frac{\lambda^x}{x!} \quad\text{for } x \in \mathbb{Z}_+. \tag{7}$$
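
The convergence in this Bernoulli example can be checked numerically. The sketch below is only an illustration with hypothetical function names: it compares the probability mass function of a sum of $n$ independent Bernoulli variables with success probability $\lambda/n$ against the Poisson mass function for growing $n$.

```python
import math

def bernoulli_sum_pmf(x: int, n: int, lam: float) -> float:
    """P(sum of n independent Bernoulli(lam/n) variables = x), i.e. a binomial pmf."""
    p = lam / n
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson(lam) probability mass at x, the claimed limit."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

if __name__ == "__main__":
    lam, x = 2.0, 3
    for n in (10, 100, 1000, 10000):
        print(n, bernoulli_sum_pmf(x, n, lam), poisson_pmf(x, lam))
```

The printed gap between the two columns should shrink as $n$ grows.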



2 Proof of Theorem 1

In this section, we define four families of random sums $\mathcal{S}_n$, $\mathcal{T}_n$, $\mathcal{U}_n$ and $\mathcal{V}_n$. Then we present Lemmas 1, 3, 4 and 5 that relate these families, and combine these results to prove Theorem 1.

The random variable considered in Theorem 1 is from the family $\mathcal{S}_n$ of random variables $S$ of the form

$$S := \sum_{i=1}^n X_i \tag{8}$$

where $X_i$ are independent random variables with $X_i \in [0, 1]$. Family $\mathcal{T}_n$ is the set of Bernoulli sums $T$ of the form

$$T := \sum_{i=1}^n Y_i \tag{9}$$

where $Y_i$ are independent random variables taking values $a_i$ or $b_i$ with $a_i, b_i \in [0, 1]$, for $i = 1, 2, \ldots, n$. Family $\mathcal{U}_n$ is the set of Bernoulli sums $U$ of the form

$$U := \sum_{i=1}^n B_i \tag{10}$$

with each Bernoulli random variable $B_i$ taking the value 0 or 1. Finally, family $\mathcal{V}_n$ is the set of shifted binomial random variables $V$ with any parameter $p \in [0, 1]$ and with either of the following two forms

$$V \sim \mathrm{binomial}(p, n) \quad\text{or}\quad V \sim 1 + \mathrm{binomial}\left(\frac{np - 1}{n - 1}, n - 1\right). \tag{11}$$

Lemma 1. For any random sum $S \in \mathcal{S}_n$, there exists a random sum $T \in \mathcal{T}_n$ such that $\mathbb{E}S = \mathbb{E}T$ and $\mathbb{P}(S \le 1) \le \mathbb{P}(T \le 1)$.

Lemma 1 follows directly from Theorem 8 of Mulholland and Rogers [3], which we state as Lemma 2.

Lemma 2. For each integer $i$ with $1 \le i \le n$, let $f_{i1}(x), \ldots, f_{ik}(x)$ be Borel-measurable functions, and let $\mathcal{K}_i$ be the set of probability distribution functions $F_i(x) : \mathbb{R} \to [0, 1]$ satisfying

$$\int_{-\infty}^{\infty} f_{ij}(x)\, dF_i(x) = 0 \quad\text{for } j = 1, 2, \ldots, k. \tag{12}$$

Let $\mathcal{E}_i$ be the set of functions from $\mathcal{K}_i$ that are step-functions having $K_i$ jumps at points $x_{i1}, x_{i2}, \ldots, x_{iK_i}$ where $1 \le K_i \le k + 1$ and where the $K_i$ vectors

$$(1, f_{i1}(x_{ij}), \ldots, f_{ik}(x_{ij})), \quad j = 1, 2, \ldots, K_i$$

are linearly independent.

Suppose that $g(x_1, x_2, \ldots, x_n)$ is Borel-measurable as a function of the point $(x_1, \ldots, x_n) \in \mathbb{R}^n$. Then

$$\sup_{F_i \in \mathcal{K}_i} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\, dF_1(x_1) \cdots dF_n(x_n) = \sup_{H_i \in \mathcal{E}_i} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\, dH_1(x_1) \cdots dH_n(x_n)$$

provided the left-hand side is finite.

Proof. See [3]. □

In the above Lemma, the conditions (12) can be interpreted as moment conditions on random variables $X_i$ whose probability distribution functions are $F_i$, while the distribution functions in $\mathcal{E}_i$ correspond to random variables whose support consists of a finite set of $K_i$ points and which satisfy conditions (12).

Proof of Lemma 1. Let $1_C$ denote the indicator function for condition $C$ and let $X_i$ be the random variables defining $S$ for $1 \le i \le n$. In Lemma 2 put

$$f_{i1}(x) := 1_{0 \le x \le 1} - 1, \qquad f_{i2}(x) := x - \mathbb{E}X_i, \tag{13}$$

which are both Borel-measurable functions. Then the set $\mathcal{E}_i$ of distribution functions corresponds to the set of random variables $Z_i$ which take on $1 \le K_i \le 3$ distinct values, say at $z_{i1}, \ldots, z_{iK_i}$, which satisfy

$$\mathbb{P}(0 \le Z_i \le 1) = 1 \quad\text{and}\quad \mathbb{E}Z_i = \mathbb{E}X_i \tag{14}$$

and for which the vectors $(1, f_{i1}(z_{ij}), f_{i2}(z_{ij}))$ are linearly independent for $1 \le j \le K_i$.

We now rule out the case $K_i = 3$, since if $\mathbb{P}(0 \le Z_i \le 1) = 1$ then the jumps must satisfy $f_{i1}(z_{ij}) = 1_{0 \le z_{ij} \le 1} - 1 = 0$ and there are at most two linearly independent vectors of the form

$$(1, 0, z_{ij} - \mathbb{E}X_i). \tag{15}$$

Thus the random variables $Z_i$ take on at most two values, say $a_i, b_i \in [0, 1]$, and so the random variables $Z_i$ match the definition of the random variables $Y_i$ defining the sum $T$.

Finally, if we set $g(x_1, \ldots, x_n) := 1_{\sum_{i=1}^n x_i \le 1}$, which is Borel-measurable, and identify the distribution functions $F_i(x)$ with those of the random variables $X_i$, then Lemma 2 gives

$$\mathbb{P}(S \le 1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\, dF_1(x_1) \cdots dF_n(x_n) \le \sup_{H_i \in \mathcal{E}_i} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\, dH_1(x_1) \cdots dH_n(x_n) \tag{16}$$

since $\mathbb{P}\left(\sum_{i=1}^n X_i \le 1\right) \in [0, 1]$. This completes the proof. □
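
Lemma 3. For any $T \in \mathcal{T}_n$ there exists a $U \in \mathcal{U}_1 \cup \mathcal{U}_2 \cup \cdots \cup \mathcal{U}_n$ such that

$$\mathbb{P}(T \le 1) \le \mathbb{P}(U \le 1) \quad\text{and}\quad \mathbb{E}T \le \mathbb{E}U. \tag{17}$$

Proof. We use induction on $n$.

If $n = 1$ then we set $U = 1$, so that

$$\mathbb{P}(T \le 1) = \mathbb{P}(U \le 1) = 1 \quad\text{and}\quad \mathbb{E}T \le \mathbb{E}U,$$

directly satisfying (17).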

If $n > 1$ there are several cases to consider, for which it helps to first rewrite $T$. Recall that $T = \sum_{i=1}^n Y_i$ where $Y_i \in \{a_i, b_i\}$ and $0 \le a_i \le b_i \le 1$. Thus we may write

$$T = a + \sum_{i=1}^n c_i B_i \tag{18}$$

where $a := \sum_{i=1}^n a_i \ge 0$, $c_i := b_i - a_i \in [0, 1]$ and where $B_i$ are independent Bernoulli random variables.

In the first case, if $a > 1$ then we put $U = \sum_{i=1}^n B'_i$ with $\mathbb{P}(B'_i = 1) = 1$ for all $i$. Then $\mathbb{P}(T \le 1) = \mathbb{P}(U \le 1) = 0$ and $\mathbb{E}T \le \sum_{i=1}^n b_i \le n = \mathbb{E}U$, satisfying (17).

Secondly, if $c_i + c_j \le 1 - a$ for some pair $i, j$ with $i \ne j$, then consider the sum

$$S := X_{ij} + \sum_{k \in \{1, 2, \ldots, n\} \setminus \{i, j\}} Y_k \quad\text{where } X_{ij} := Y_i + Y_j. \tag{19}$$

We have

$$X_{ij} = a_i + a_j + c_i B_i + c_j B_j \le a + c_i + c_j \le 1$$

for all realizations of $B_i, B_j$. Thus $S \in \mathcal{S}_{n-1}$ and we can apply Lemma 1 to show that $\mathbb{E}T = \mathbb{E}S \le \mathbb{E}T'$ and $\mathbb{P}(T \le 1) = \mathbb{P}(S \le 1) \le \mathbb{P}(T' \le 1)$ for some $T' \in \mathcal{T}_{n-1}$. The Lemma then follows by induction.

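Otherwise, we have $a \in [0, 1]$, $c_i \in [0, 1]$ and $c_i + c_j > 1 - a$ for all $i$ and all $j \ne i$. The key observation is that the latter condition implies that

$$\mathbb{P}(T \le 1) = \mathbb{P}\left(\sum_{i \in C} B_i \le 1,\ \sum_{i \in D} B_i = 0\right)$$

where $C := \{i : c_i \le 1 - a\}$, $D := \{i : c_i > 1 - a\}$.

If $\sum_{i \in C} \mathbb{E}B_i \le 1$ and $C \ne \emptyset$, then we put $U = 1 + \sum_{i \in D} B_i$, noting that $U \in \bigcup_{m=1}^n \mathcal{U}_m$, giving

$$\mathbb{P}(U \le 1) = \mathbb{P}\left(\sum_{i \in D} B_i = 0\right) \ge \mathbb{P}\left(\sum_{i \in C} B_i \le 1,\ \sum_{i \in D} B_i = 0\right) = \mathbb{P}(T \le 1)$$

and

$$\mathbb{E}U = 1 + \sum_{i \in D} \mathbb{E}B_i \ge a + (1 - a) \sum_{i \in C} \mathbb{E}B_i + \sum_{i \in D} \mathbb{E}B_i \quad \left(\text{as } \sum_{i \in C} \mathbb{E}B_i \le 1\right)$$

$$\ge a + \sum_{i \in C} c_i \mathbb{E}B_i + \sum_{i \in D} c_i \mathbb{E}B_i = \mathbb{E}T,$$

satisfying (17).

Finally, if $\sum_{i \in C} \mathbb{E}B_i > 1$ or $C = \emptyset$, then we put $U = \sum_{i \in C} B_i + \sum_{i \in D} B_i$, noting that $U \in \mathcal{U}_n$, so that

$$\mathbb{P}(U \le 1) = \mathbb{P}\left(\sum_{i \in C} B_i + \sum_{i \in D} B_i \le 1\right) \ge \mathbb{P}\left(\sum_{i \in C} B_i \le 1,\ \sum_{i \in D} B_i = 0\right) = \mathbb{P}(T \le 1)$$

and

$$\mathbb{E}U = \sum_{i \in C} \mathbb{E}B_i + \sum_{i \in D} \mathbb{E}B_i \ge a + (1 - a) \sum_{i \in C} \mathbb{E}B_i + \sum_{i \in D} \mathbb{E}B_i \ge a + \sum_{i \in C} c_i \mathbb{E}B_i + \sum_{i \in D} c_i \mathbb{E}B_i = \mathbb{E}T,$$

satisfying (17) and completing the proof. □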
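
The key observation in the last two cases of this proof, that under the pairwise condition $c_i + c_j > 1 - a$ the event $\{T \le 1\}$ coincides with the event that at most one $B_i$ with $c_i \le 1 - a$ equals 1 and no $B_i$ with $c_i > 1 - a$ equals 1, can be sanity-checked by enumerating all 0/1 realizations. The sketch below is an illustration only; the function name and the sample values of $a$ and $c_i$ are hypothetical.

```python
from itertools import product

def events_agree(a: float, c: list[float]) -> bool:
    """Check {T <= 1} == {sum_C B_i <= 1 and sum_D B_i == 0} for every 0/1 realization of B,
    where T = a + sum_i c_i * B_i, C = {i : c_i <= 1 - a} and D = {i : c_i > 1 - a}."""
    threshold = 1 - a
    C = [i for i, ci in enumerate(c) if ci <= threshold]
    D = [i for i, ci in enumerate(c) if ci > threshold]
    for b in product((0, 1), repeat=len(c)):
        t = a + sum(ci * bi for ci, bi in zip(c, b))
        lhs = t <= 1
        rhs = sum(b[i] for i in C) <= 1 and sum(b[i] for i in D) == 0
        if lhs != rhs:
            return False
    return True

if __name__ == "__main__":
    # Pairwise condition holds here: the smallest c_i + c_j is 0.9 > 1 - a = 0.6.
    print(events_agree(0.4, [0.5, 0.4, 0.7, 0.9]))
    # Pairwise condition fails here (0.2 + 0.3 <= 0.9), so the events need not agree.
    print(events_agree(0.1, [0.2, 0.3]))
```

The second call shows why the pairwise condition matters: when it fails, two of the $B_i$ can equal 1 while $T$ still stays below 1.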
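
Lemma 4. For any $U \in \mathcal{U}_n$ with $n \ge 1$, there exists a $V \in \bigcup_{m=1}^n \mathcal{V}_m$ such that

$$\mathbb{P}(U \le 1) \le \mathbb{P}(V \le 1) \quad\text{and}\quad \mathbb{E}U = \mathbb{E}V. \tag{20}$$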

Proof. Let $U := \sum_{i=1}^n B_i$ where $B_i$ are Bernoulli random variables, $q_i := \mathbb{E}B_i$ and $q := (q_1, q_2, \ldots, q_n)$.

We have

$$\mathbb{P}(U \le 1) = \mathbb{P}\left(\sum_{i=1}^n B_i = 0\right) + \mathbb{P}\left(\sum_{i=1}^n B_i = 1\right) \tag{21}$$

$$= \prod_{i=1}^n (1 - q_i) + \sum_{j=1}^n q_j \prod_{i=1,\, i \ne j}^n (1 - q_i) =: L_n(q). \tag{22}$$

Consider maximizing $L_n(q)$ over $q \in \{[0, 1]^n \mid \lambda = \sum_{i=1}^n q_i\}$, noting that maxima might lie on the interior with $q_i \in (0, 1)$ for all $1 \le i \le n$ or on the boundary with $q_i \in \{0, 1\}$ for some $i$. Since $L_n(q)$ is a differentiable function of $q$, any critical point of $L_n(q)$ on the interior with $q \in \{(0, 1)^n \mid \lambda = \sum_{i=1}^n q_i\}$ must satisfy

$$\nabla_{q_k} \left( L_n(q) + \mu \sum_{i=1}^n q_i \right) = 0 \quad\text{for all } 1 \le k \le n \tag{23}$$

for a suitable Lagrange multiplier $\mu$. However, $L_n(q)$ is a symmetric linear function of each $q_k$. So if $n \ge 2$ then any solution of equation (23) must have $q_k = q_l$ for all $k \ne l$ in $1, \ldots, n$. Thus $q = \left(\frac{\lambda}{n}, \ldots, \frac{\lambda}{n}\right)$, for which $U$ corresponds to the random variable $V_{n,0} \sim \mathrm{binomial}\left(\frac{\lambda}{n}, n\right)$ which is in $\mathcal{V}_n$ and has $\mathbb{E}V_{n,0} = \lambda$.

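If $q_i = 1$ for some $i$ and $n \ge 2$ then the arithmetic-geometric mean inequality gives

$$L_n(q) = \prod_{j=1,\, j \ne i}^n (1 - q_j) \le \left(1 - \frac{\lambda - 1}{n - 1}\right)^{n-1}. \tag{24}$$

The right-hand side is $\mathbb{P}(V_{n,1} \le 1)$ for the random variable $V_{n,1} \sim 1 + \mathrm{binomial}\left(\frac{\lambda - 1}{n - 1}, n - 1\right)$ for which $V_{n,1} \in \mathcal{V}_n$ and $\mathbb{E}V_{n,1} = \lambda$.

If $q_i = 0$ for some $i$ and $n \ge 2$ then the definition of $L_n(q)$ gives

$$L_n(q) = L_{n-1}(q^i) \quad\text{where } q^i := (q_1, \ldots, q_{i-1}, q_{i+1}, \ldots, q_n). \tag{25}$$

However to have $q_i = 0$ for some $i$ we require that $\lambda = \sum_{j=1,\, j \ne i}^n q_j \le n - 1$.

In summary, if $q \in \{[0, 1]^n \mid \lambda = \sum_{i=1}^n q_i\}$ and $n \ge 2$ then

$$L_n(q) \le \begin{cases} \max_{1 \le i \le n} \{L_{n-1}(q^i),\ \mathbb{P}(V_{n,0} \le 1),\ \mathbb{P}(V_{n,1} \le 1)\} & \text{if } 0 \le \lambda \le n - 1 \\ \max\{\mathbb{P}(V_{n,0} \le 1),\ \mathbb{P}(V_{n,1} \le 1)\} & \text{if } n - 1 < \lambda \le n, \end{cases}$$

and for $n = 1$, consider the random variable $V_{1,1} := \mathrm{binomial}(\lambda, 1)$ for which $V_{1,1} \in \mathcal{V}_1$, $\mathbb{P}(U \le 1) = \mathbb{P}(V_{1,1} \le 1)$ and $\lambda = \mathbb{E}V_{1,1}$. Thus

$$\mathbb{P}(U \le 1) \le \max_{\{V \,\mid\, V \in \bigcup_{m=1}^n \mathcal{V}_m,\ \mathbb{E}V = \lambda\}} \mathbb{P}(V \le 1)$$

which completes the proof. □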
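
The closed form for $\mathbb{P}(U \le 1)$ used in this proof, namely the product over all $(1 - q_i)$ plus the sum over $j$ of $q_j$ times the product of the remaining $(1 - q_i)$, can be checked against brute-force enumeration. The sketch below does this and also evaluates the two candidate extremal values for a sample $q$; all function names and the sample vectors are hypothetical, and the code is an illustration rather than part of the proof.

```python
from itertools import product
import math

def L(q):
    """P(U = 0) + P(U = 1) for a sum U of independent Bernoulli(q_i) variables."""
    n = len(q)
    p0 = math.prod(1 - qi for qi in q)
    p1 = sum(q[j] * math.prod(1 - q[i] for i in range(n) if i != j) for j in range(n))
    return p0 + p1

def prob_at_most_one(q):
    """The same probability by enumerating every 0/1 outcome."""
    total = 0.0
    for b in product((0, 1), repeat=len(q)):
        if sum(b) <= 1:
            total += math.prod(qi if bi else 1 - qi for qi, bi in zip(q, b))
    return total

def extremal_bound(lam, n):
    """Larger of the binomial and shifted-binomial values of P(V <= 1), for n >= 2.
    For lam < 1 the shifted-binomial expression exceeds 1 and is vacuous."""
    f = (1 - lam / n) ** n + lam * (1 - lam / n) ** (n - 1)
    g = (1 - (lam - 1) / (n - 1)) ** (n - 1)
    return max(f, g)

if __name__ == "__main__":
    q = [0.9, 0.9, 0.2]
    print(L(q), prob_at_most_one(q), extremal_bound(sum(q), len(q)))
```

For $q = (1, 0.5)$ the closed form equals the shifted-binomial value exactly, which matches the boundary case $q_i = 1$ treated in the proof.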

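Lemma 5. Let $H_n(\lambda) := \sup\{\mathbb{P}(V \le 1) \mid V \in \mathcal{V}_n,\ \mathbb{E}V = \lambda\}$ with the convention that $\sup \emptyset = 0$. Then

$$H_n(\lambda) \le H_{n'}(\lambda') \quad\text{for all } 0 \le \lambda' \le \lambda \text{ and all } 1 \le n \le n'. \tag{26}$$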

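Proof. The definition of $\mathcal{V}_n$ gives

$$H_n(\lambda) = \begin{cases} 1 & \text{if } 0 \le \lambda \le 1 \\ \max\{F_n(\lambda), G_n(\lambda)\} & \text{if } 1 < \lambda \le n \\ 0 & \text{if } n < \lambda \end{cases} \tag{27}$$

where for $1 < \lambda \le n$

$$F_n(\lambda) := \mathbb{P}\left(\mathrm{binomial}\left(\frac{\lambda}{n}, n\right) \le 1\right) = \left(1 - \frac{\lambda}{n}\right)^n + \lambda\left(1 - \frac{\lambda}{n}\right)^{n-1}, \tag{28}$$

$$G_n(\lambda) := \mathbb{P}\left(1 + \mathrm{binomial}\left(\frac{\lambda - 1}{n - 1}, n - 1\right) \le 1\right) = \left(1 - \frac{\lambda - 1}{n - 1}\right)^{n-1}, \tag{29}$$

so let us collect some facts about $F_n(\lambda)$ and $G_n(\lambda)$.

First, set $x := 1 - \lambda/n$ so we have $n = \lambda/(1 - x)$ and

$$\log F_n(\lambda) = \log\left(x^n + \lambda x^{n-1}\right) = \frac{\lambda \log x}{1 - x} + \log\left(1 + \frac{\lambda}{x}\right) =: g(x). \tag{30}$$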
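
Now

$$\frac{(1 - x)^2}{\lambda} \nabla g(x) = \log x + \frac{1 - x}{x} - \frac{(1 - x)^2}{x(x + \lambda)} =: u(x) \tag{31}$$

and

$$\nabla u(x) = \frac{x - 1}{x^2 (x + \lambda)^2} \left(x^2 + (\lambda - 2)x + \lambda^2 - \lambda\right). \tag{32}$$

Note that $\min_{x \in \mathbb{R}} \left(x^2 + (\lambda - 2)x + \lambda^2 - \lambda\right) = (3\lambda^2 - 4)/4$. Thus if $\lambda \ge 2/\sqrt{3}$ then $\nabla u(x) \le 0$ for all $0 < x < 1$ and so $u(x) \ge u(1) = 0$. Thus $\nabla g(x) \ge 0$, by (31), so that $\log F_n(\lambda)$ is increasing in $n$, by (30) and from the fact that $n = \lambda/(1 - x)$ is increasing in $x$ for fixed $\lambda$. Hence

$$F_n(\lambda) \le F_{n+1}(\lambda) \quad\text{for all } 2/\sqrt{3} \le \lambda \le n \text{ and } n \ge 2. \tag{33}$$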

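Second, Taylor expansion gives

$$\log G_{n+1}(\lambda + 1) = n \log\left(1 - \frac{\lambda}{n}\right) = -\lambda - \sum_{k=2}^{\infty} \frac{\lambda^k}{k\, n^{k-1}} \quad\text{for } \left|\frac{\lambda}{n}\right| < 1 \tag{34}$$

which is a non-decreasing function of $n$ for $\lambda \ge 0$. Thus

$$G_n(\lambda) \le G_{n+1}(\lambda) \quad\text{for all } 1 \le \lambda \le n \text{ and } n \ge 2. \tag{35}$$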
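Third, considering the range of $\lambda$ for which $F_n(\lambda) \le G_n(\lambda)$ gives

$$\left(1 - \frac{\lambda}{n}\right)^n + \lambda\left(1 - \frac{\lambda}{n}\right)^{n-1} \le \left(1 - \frac{\lambda - 1}{n - 1}\right)^{n-1} \tag{36}$$

$$\iff\quad \frac{n + (n - 1)\lambda}{n} \left(\frac{n - \lambda}{n}\right)^{n-1} \le \left(\frac{n - \lambda}{n - 1}\right)^{n-1} \tag{37}$$

$$\iff\quad \lambda \le \left(\left(\frac{n}{n - 1}\right)^{n-1} - 1\right) \frac{n}{n - 1}. \tag{38}$$

Applying the inequality $\log x \le x - 1$ to $x = \frac{n-1}{n}$ we see that $n \log \frac{n-1}{n} \le -1$; hence $\left(\frac{n}{n-1}\right)^n \ge e$. So for $n \ge 3$ we have

$$\left(\left(\frac{n}{n - 1}\right)^{n-1} - 1\right) \frac{n}{n - 1} \ge e - \frac{n}{n - 1} \ge e - \frac{3}{2} \ge \frac{2}{\sqrt{3}}. \tag{39}$$

Additionally $G_2(\lambda) - F_2(\lambda) = (\lambda - 2)^2/4 \ge 0$ for all $\lambda \in \mathbb{R}$. In conjunction with (38) and (39) this gives

$$F_n(\lambda) \le G_n(\lambda) \quad\text{for all } \lambda \le \frac{2}{\sqrt{3}} \text{ and } n \ge 2. \tag{40}$$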

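Now consider the function $H_n(\lambda)$. The definition of $H_n(\lambda)$ gives

$$H_n(\lambda) = H_{n+1}(\lambda) = 1 \quad\text{for all } 0 \le \lambda \le 1 \text{ and } n \ge 1, \tag{41}$$

$$0 = H_n(\lambda) \le H_{n+1}(\lambda) \quad\text{for all } n < \lambda \le n + 1 \text{ and } n \ge 1, \tag{42}$$

$$H_n(\lambda) = H_{n+1}(\lambda) = 0 \quad\text{for all } \lambda > n + 1 \text{ and } n \ge 1. \tag{43}$$

For all $1 < \lambda < \frac{2}{\sqrt{3}}$ and all $n \ge 2$, (35) and (40) give

$$H_n(\lambda) = \max\{F_n(\lambda), G_n(\lambda)\} = G_n(\lambda) \le G_{n+1}(\lambda) \tag{44}$$

$$\le \max\{F_{n+1}(\lambda), G_{n+1}(\lambda)\} = H_{n+1}(\lambda). \tag{45}$$

For all $\frac{2}{\sqrt{3}} \le \lambda \le n$ and all $n \ge 2$, (33) and (35) give

$$H_n(\lambda) = \max\{F_n(\lambda), G_n(\lambda)\} \tag{46}$$

$$\le \max\{F_{n+1}(\lambda), G_{n+1}(\lambda)\} = H_{n+1}(\lambda). \tag{47}$$

In summary

$$H_n(\lambda) \le H_{n+1}(\lambda) \quad\text{for all } \lambda \ge 0 \text{ and } n \ge 1 \tag{48}$$

showing that $H_n(\lambda)$ is non-decreasing in $n$.

Finally, $\nabla_\lambda F_n(\lambda) = -\frac{n-1}{n} \lambda \left(1 - \frac{\lambda}{n}\right)^{n-2} \le 0$ and $\nabla_\lambda G_n(\lambda) \le 0$, so $H_n(\lambda)$ is non-increasing in $\lambda$.

This completes the proof. □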
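
The two monotonicity properties established in this proof can be spot-checked on a grid. The sketch below assumes the piecewise form of $H_n(\lambda)$ used above (equal to 1 for $\lambda \le 1$, to the larger of the binomial and shifted-binomial values for $1 < \lambda \le n$, and to 0 for $\lambda > n$); the function names, grid size and tolerance are hypothetical choices, and a grid check is of course no substitute for the proof.

```python
def F(n: int, lam: float) -> float:
    """P(binomial(lam/n, n) <= 1)."""
    return (1 - lam / n) ** n + lam * (1 - lam / n) ** (n - 1)

def G(n: int, lam: float) -> float:
    """P(1 + binomial((lam-1)/(n-1), n-1) <= 1); requires n >= 2."""
    return (1 - (lam - 1) / (n - 1)) ** (n - 1)

def H(n: int, lam: float) -> float:
    """Assumed piecewise form of H_n(lam)."""
    if lam <= 1:
        return 1.0
    if lam <= n:
        return max(F(n, lam), G(n, lam))
    return 0.0

def is_monotone(max_n: int = 8, steps: int = 200, tol: float = 1e-12) -> bool:
    """Grid check: H is non-decreasing in n and non-increasing in lam."""
    lams = [k * max_n / steps for k in range(steps + 1)]
    for n in range(1, max_n):
        for lam in lams:
            if H(n, lam) > H(n + 1, lam) + tol:
                return False
    for n in range(1, max_n + 1):
        values = [H(n, lam) for lam in lams]
        if any(v2 > v1 + tol for v1, v2 in zip(values, values[1:])):
            return False
    return True

if __name__ == "__main__":
    print(is_monotone())
```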
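
Proof of Theorem 1. By Lemmas 1, 3 and 4 there exist random sums $T \in \mathcal{T}_n$, $U \in \bigcup_{m=1}^n \mathcal{U}_m$ and $V \in \bigcup_{m=1}^n \mathcal{V}_m$ such that

$$\mathbb{P}(S \le 1) \le \mathbb{P}(T \le 1) \le \mathbb{P}(U \le 1) \le \mathbb{P}(V \le 1) \quad\text{and}\quad \mathbb{E}S = \mathbb{E}T \le \mathbb{E}U = \mathbb{E}V.$$

Say that $V \in \mathcal{V}_m$ for some $1 \le m \le n$ and let $\lambda_V := \mathbb{E}V$. Then $\lambda_V \ge \lambda = \mathbb{E}S$ as just shown, so Lemma 5 gives

$$\mathbb{P}(V \le 1) \le H_m(\lambda_V) \le H_n(\lambda).$$

Now, by definition of $H_n(\lambda)$, for $1 < \lambda \le n$ we have $H_n(\lambda) = \max\{F_n(\lambda), G_n(\lambda)\}$ where $F_n(\lambda) := \left(1 + \lambda - \frac{\lambda}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{n-1}$ and $G_n(\lambda) := \left(1 - \frac{\lambda - 1}{n - 1}\right)^{n-1}$, so that

$$\mathbb{P}(S \le 1) \le \max\{F_n(\lambda), G_n(\lambda)\} \tag{49}$$

which proves (1). Furthermore, Lemma 5 gives

$$H_n(\lambda) \le \lim_{m \to \infty} \max\{F_m(\lambda), G_m(\lambda)\} = \max\{1 + \lambda, e\}\, e^{-\lambda} \tag{50}$$

which completes the proof. □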

3 Application 

If we wish to bound the expectation of a random sum, then Theorem 1 can be conveniently rearranged as follows.

Corollary 1. Suppose that $S = \sum_{i=1}^n X_i$ is a sum of independent random variables with $\mathbb{P}(0 \le X_i \le 1) = 1$ for $1 \le i \le n$. Then for $r = 0.841405\ldots$, we have

$$\mathbb{P}(S \le 1) \le e^{1 - r\,\mathbb{E}S} \quad\text{or equivalently}\quad \mathbb{E}S \le \frac{1}{r}\left(1 - \log \mathbb{P}(S \le 1)\right). \tag{51}$$

Proof. We work with the right-hand side of Theorem 1 to find the smallest $a$ such that

$$\max\{e, 1 + m\}\, e^{-m} \le e^{1 - m + am}$$

for all $m \ge 0$, or equivalently, such that

$$\log(1 + m) - am \le 1.$$

For fixed $a > 0$ the left-hand side is concave with a unique maximum at $m = \frac{1}{a} - 1$. Substituting this $m$, we require that

$$a - \log a \le 2.$$

Now the function $a - \log a$ is decreasing for $a < 1$, thus we require that $a \ge a_0$ where $a_0$ is the root of $a_0 = e^{a_0 - 2}$ having $a_0 < 1$. A fixed point method yields the solution $a_0 = 0.158594\ldots = 1 - r$. □
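
The fixed point computation mentioned at the end of this proof can be sketched as follows. The proof does not say which fixed point scheme was used, so plain iteration of $a \mapsto e^{a - 2}$ from $a = 0$ is an assumption made here for illustration; the function names, iteration count and grid are likewise hypothetical.

```python
import math

def solve_a0(iterations: int = 100) -> float:
    """Iterate a <- exp(a - 2), starting from 0, towards the root a0 < 1."""
    a = 0.0
    for _ in range(iterations):
        a = math.exp(a - 2)
    return a

def worst_slack(a: float, m_max: float = 50.0, steps: int = 5000) -> float:
    """Largest value of log(1 + m) - a*m over a grid of m in [0, m_max]."""
    grid = (k * m_max / steps for k in range(steps + 1))
    return max(math.log1p(m) - a * m for m in grid)

a0 = solve_a0()
r = 1 - a0

if __name__ == "__main__":
    print(a0, r, worst_slack(a0))
```

The iteration converges quickly because the derivative of $e^{a-2}$ at the root equals $a_0$ itself, which is well below 1. The grid maximum of $\log(1 + m) - a_0 m$ should not exceed 1, in line with the requirement $a - \log a \le 2$.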